The amount of graph-structured data has recently experienced an enormousgrowth in many applications. To transform such data into useful information,fast analytics algorithms and software tools are necessary. One common graphanalytics kernel is disjoint community detection (or graph clustering). Despiteextensive research on heuristic solvers for this task, only few parallel codesexist, although parallelism will be necessary to scale to the data volume ofreal-world applications. We address the deficit in computing capability by aflexible and extensible community detection framework with shared-memoryparallelism. Within this framework we design and implement efficient parallelcommunity detection heuristics: A parallel label propagation scheme; the firstlarge-scale parallelization of the well-known Louvain method, as well as anextension of the method adding refinement; and an ensemble scheme combining theabove. In extensive experiments driven by the algorithm engineering paradigm,we identify the most successful parameters and combinations of thesealgorithms. We also compare our implementations with state-of-the-artcompetitors. The processing rate of our fastest algorithm often reaches 50Medges/second. We recommend the parallel Louvain method and our variant withrefinement as both qualitatively strong and fast. Our methods are suitable formassive data sets with billions of edges.
展开▼